A Ground Truth Bleed-Through Document Image Database
Authors
Abstract
This paper introduces a new database of 25 recto/verso image pairs from documents suffering from bleed-through degradation, together with manually created foreground text masks. The structure and creation of the database are described, and three bleed-through restoration methods are compared in two ways: visually, and quantitatively using the ground truth masks.
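As an illustration of what a quantitative comparison against a ground truth mask can look like, the sketch below scores a restored binary image against a manually created foreground mask. The abstract does not say which measures the paper uses, so the pixel-wise precision, recall, and F-measure here are an assumption, and the function name `mask_scores` and the toy masks are hypothetical.

```python
def mask_scores(predicted, truth):
    """Compare two binary masks given as lists of rows (truthy = foreground text).

    Returns pixel-wise precision, recall, and F-measure. These measures are an
    illustrative assumption, not necessarily the ones used in the paper.
    """
    if len(predicted) != len(truth) or any(
        len(p_row) != len(t_row) for p_row, t_row in zip(predicted, truth)
    ):
        raise ValueError("masks must have the same shape")

    tp = fp = fn = 0
    for pred_row, true_row in zip(predicted, truth):
        for p, t in zip(pred_row, true_row):
            if p and t:
                tp += 1  # foreground correctly kept
            elif p and not t:
                fp += 1  # background (e.g. bleed-through) left as foreground
            elif t and not p:
                fn += 1  # genuine foreground text removed

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f_measure": f_measure}


# Toy 2x4 masks (hypothetical data): 1 = foreground text, 0 = background.
truth = [[0, 1, 1, 0],
         [0, 1, 1, 0]]
restored = [[0, 1, 1, 1],
            [0, 1, 0, 0]]
scores = mask_scores(restored, truth)
```

In this toy case one background pixel is wrongly kept and one text pixel is wrongly removed, so precision and recall are both 3/4. Running the same scoring over each restoration method's output would give one set of numbers per method to set beside the visual comparison.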
Similar resources
Document perceptual quality ground truth creation
This article focuses on a new method for document perceptual quality ground truth creation. This type of ground truth assigns a quality-related score to each image in a dataset, which is useful for evaluating the performance of algorithms that measure image quality. The quality of a document image is related to the amount of degradation it exhibits. To our knowledge, a methodology to create this kind...
Document Image Binarization
The principal stage of the document image analysis procedure is binarization, in which pixels are classified as either text or background. It is a crucial stage that can affect later stages, including the final character recognition stage. This thesis focuses on document image binarization, covering both binarization techniques and evaluation methodologies. Specifically, accordin...
Automatic Assessment of OCR Quality in Historical Documents
Mass digitization of historical documents is a challenging problem for optical character recognition (OCR) tools. Issues include noisy backgrounds and faded text due to aging, border/marginal noise, bleed-through, skewing, warping, as well as irregular fonts and page layouts. As a result, OCR tools often produce a large number of spurious bounding boxes (BBs) in addition to those that correspon...
Objective Quality Measurement for Geometric Document Image Restoration
Many algorithms to remove distortion from document images have been proposed in recent years, but so far there is no reliable method for comparing their performance. In this paper we propose a collection of methods to measure the quality of such restoration algorithms for document images that show a non-linear distortion due to perspective or page curl. For the results from these measurements to be...
Performance Evaluation of Document Structure Extraction Algorithms
This paper presents a performance metric for document structure extraction algorithms based on finding the correspondences between detected entities and ground truth. We describe a method for determining an algorithm’s optimal tuning parameters. We evaluate a group of document layout analysis algorithms on 1600 images from the UW-III Document Image Database, and the quantitative performance measu...